    Naive Bayes Classification in The Question and Answering System

    Abstract—A question answering (QA) system answers questions posed in natural language based on collections of unstructured text. In general, a QA system consists of four stages: question analysis, document selection, passage retrieval, and answer extraction. In this study we added two processes, document classification and passage classification. We use Naïve Bayes for classification, Dynamic Passage Partitioning for answer finding, and Lucene for document selection. The experiment used 100 questions over 3,000 disease-related documents, and the results were compared with a system that does not use the classification process. The test results show that the system works best when using the 10 most relevant documents, the 5 highest-scoring passages, and the 10 closest-distance answers. The Mean Reciprocal Rank (MRR) of the QA system with classification is 0.41960, which is 4.9% better than the MRR of the QA system without classification.
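
    As a rough illustration of the added classification step, the sketch below trains a Naïve Bayes classifier on bag-of-words document vectors and keeps only the candidate documents whose predicted class matches the question's class. The categories, training texts, and helper names are illustrative assumptions, not the study's actual data or pipeline (which also relies on Lucene and Dynamic Passage Partitioning).

        # Minimal sketch: Naive Bayes document classification as a filtering step in QA.
        # Categories, training texts, and the question class are placeholders.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        train_docs = [
            "dengue fever is transmitted by mosquito bites",         # label: cause
            "tuberculosis is treated with a course of antibiotics",  # label: treatment
            "symptoms of malaria include fever and chills",          # label: symptom
        ]
        train_labels = ["cause", "treatment", "symptom"]

        vectorizer = CountVectorizer()
        classifier = MultinomialNB().fit(vectorizer.fit_transform(train_docs), train_labels)

        def filter_documents(question_class, candidate_docs):
            """Keep only candidates whose predicted class matches the question class."""
            predictions = classifier.predict(vectorizer.transform(candidate_docs))
            return [doc for doc, label in zip(candidate_docs, predictions)
                    if label == question_class]

        # Example: keep only documents classified as describing a treatment.
        print(filter_documents("treatment",
                               ["antibiotics cure tuberculosis",
                                "mosquitoes spread dengue fever"]))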

    Sketch-Based Image Retrieval with Histogram of Oriented Gradients and Hierarchical Centroid Methods

    Searching images in a digital image dataset can be done with sketch-based image retrieval, which retrieves images based on the similarity between the dataset images and an input sketch. Preprocessing uses Canny edge detection to detect the edges of the dataset images. Feature extraction is done with Histogram of Oriented Gradients and Hierarchical Centroid on the sketch image and all preprocessed dataset images. The feature distance between the sketch image and each dataset image is calculated with Euclidean distance. The dataset images used in the test consist of 10 classes. The test results show that Histogram of Oriented Gradients, Hierarchical Centroid, and the combination of both methods, with low and high thresholds of 0.05 and 0.5, achieve average precision and recall values of 90.8% and 13.45%, 70% and 10.64%, and 91.4% and 13.58%, respectively. The average precision and recall values with low and high thresholds of 0.01 and 0.1, and of 0.3 and 0.7, are 87.2% and 13.19%, and 86.7% and 12.57%. The combination of the Histogram of Oriented Gradients and Hierarchical Centroid methods with low and high thresholds of 0.05 and 0.5 produces better retrieval results than either method alone or the other threshold settings.
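
    As a rough illustration of the retrieval loop, the sketch below edge-detects dataset images with Canny, describes both the sketch and the dataset images with HOG, and ranks results by Euclidean distance. Grayscale input, the 128x128 resize, and the HOG parameters are assumptions; the Hierarchical Centroid feature and the paper's double-threshold settings are not reproduced here.

        # Minimal sketch of sketch-based retrieval: Canny preprocessing of dataset
        # images, HOG features, and Euclidean-distance ranking. Image size and HOG
        # parameters are assumptions; the Hierarchical Centroid feature is omitted.
        import numpy as np
        from skimage.feature import canny, hog
        from skimage.transform import resize

        def describe(image, is_dataset_image, size=(128, 128)):
            """HOG descriptor; dataset images pass through Canny first, the sketch is used as-is."""
            img = resize(image, size, anti_aliasing=True)  # expects a 2D grayscale array
            if is_dataset_image:
                img = canny(img).astype(float)
            return hog(img, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

        def rank_dataset(sketch_image, dataset_images, top_k=10):
            """Indices of the dataset images closest to the sketch in HOG feature space."""
            query = describe(sketch_image, is_dataset_image=False)
            distances = [np.linalg.norm(query - describe(img, is_dataset_image=True))
                         for img in dataset_images]
            return np.argsort(distances)[:top_k]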

    Automatic Essay Scoring in E-learning System Using LSA Method with N-Gram Feature for Bahasa Indonesia

    In education, an e-learning system can be used to support the educational process, and it is usually used by educators to evaluate learners' learning outcomes. In the evaluation process within an e-learning system, the exam question types most often used are multiple choice and short answer. Essay questions are rarely used in the educational evaluation process because of differences in subjectivity and the time-consuming assessment. This design aims to create an automatic essay scoring feature in an e-learning system that can be used to support the learning process. The method used for automatic essay scoring is Latent Semantic Analysis (LSA) with an n-gram feature. The evaluation results of the automatic essay scoring feature show average accuracies of 78.65%, 58.89%, 14.91%, 71.37%, and 64.49% for LSA with unigram, bigram, trigram, unigram + bigram, and unigram + bigram + trigram, respectively.
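
    As a rough illustration of LSA scoring with n-gram features, the sketch below builds an n-gram term matrix over a few reference answers plus the student answer, reduces it with truncated SVD, and scores the student answer by cosine similarity to the reference. The texts, n-gram range, number of SVD components, and scoring scale are illustrative assumptions, not the paper's data or settings.

        # Minimal sketch of LSA-based essay scoring with n-gram features.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.metrics.pairwise import cosine_similarity

        reference_answers = [  # placeholder answer-key texts
            "fotosintesis mengubah energi cahaya menjadi energi kimia",
            "tumbuhan menggunakan cahaya matahari untuk membuat makanan",
            "klorofil menyerap cahaya untuk proses fotosintesis",
        ]

        def lsa_score(student_answer, references=reference_answers,
                      ngram_range=(1, 2), n_components=2):
            """Similarity of the student answer to the first reference in LSA space."""
            texts = references + [student_answer]
            term_matrix = CountVectorizer(ngram_range=ngram_range).fit_transform(texts)
            reduced = TruncatedSVD(n_components=n_components).fit_transform(term_matrix)
            similarity = cosine_similarity(reduced[-1:], reduced[:1])[0, 0]
            return max(0.0, similarity)  # value in [0, 1]; can be scaled to a grade

        print(lsa_score("tumbuhan mengubah cahaya menjadi energi kimia lewat fotosintesis"))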

    Spelling Correction for Text Documents in Bahasa Indonesia Using Finite State Automata and Levenshtein Distance Method

    Any mistake in the writing of a document can cause information to be conveyed incorrectly. These days most documents are written on a computer, so spelling correction is needed to fix writing mistakes. This design process discusses the making of a spelling corrector for text documents in Indonesian, taking a document's text as input and producing a .txt file as output. For the realization, 5,000 news articles were used as training data. The methods used include Finite State Automata (FSA), Levenshtein distance, and n-grams. The results of this design process are shown by perplexity evaluation, correction hit rate, and false positive rate. The smallest perplexity value belongs to the unigram, at 1.14. On the other hand, the highest correction hit rate is shared by the bigram and trigram at 71.20%, but the bigram is superior in average processing time at 01:21.23 min. The false positive rates of the unigram, bigram, and trigram are the same, 4.15%. Due to the disadvantages of the FSA method, a modification was made, raising the bigram's correction hit rate to 85.44%.
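
    As a rough illustration of the Levenshtein-distance component, the sketch below computes edit distance with dynamic programming and suggests dictionary words within a small distance of a misspelled word. The word list, distance threshold, and example are placeholders; the FSA and n-gram language-model parts of the paper are not reproduced here.

        # Minimal sketch of dictionary-based spelling correction with Levenshtein distance.
        def levenshtein(a, b):
            """Edit distance with insertions, deletions, and substitutions."""
            previous = list(range(len(b) + 1))
            for i, ca in enumerate(a, start=1):
                current = [i]
                for j, cb in enumerate(b, start=1):
                    current.append(min(previous[j] + 1,                # deletion
                                       current[j - 1] + 1,             # insertion
                                       previous[j - 1] + (ca != cb)))  # substitution
                previous = current
            return previous[-1]

        def suggest(word, dictionary, max_distance=2):
            """Dictionary words within max_distance edits of the word, closest first."""
            candidates = [(levenshtein(word, entry), entry) for entry in dictionary]
            return [entry for dist, entry in sorted(candidates) if dist <= max_distance]

        dictionary = {"sekolah", "belajar", "membaca", "menulis"}  # placeholder word list
        print(suggest("skolah", dictionary))  # -> ['sekolah']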

    Spelling Correction Application with Damerau-Levenshtein Distance to Help Teachers Examine Typographical Error in Exam Test Scripts

    This research was intended to create a spelling correction application that helps teachers examine exam question scripts, with the capability to find typographical errors and give suggestions for non-real-word errors. The application is built with the simple Damerau-Levenshtein distance method to detect errors and suggest words for the misspelled word. Teachers can use the application to examine documents containing short-answer, essay, and multiple-choice questions and then save them back in the form of the original documents. The application uses a dictionary lookup consisting of 41,312 Indonesian words. In the first test, the application detected non-real-word errors in 50 sentences, each containing one non-real-word error, with an accuracy of 88%. The second test detects typographical errors in an exam script consisting of 15 sample questions: five essay questions, five short-answer questions, and five multiple-choice questions.
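
    As a rough illustration, the sketch below implements the restricted (optimal string alignment) form of the Damerau-Levenshtein distance, which extends Levenshtein distance with transposition of adjacent characters, and uses it to flag non-real-word errors against a small placeholder dictionary. The word list, distance threshold, and example are assumptions, not the application's 41,312-word dictionary.

        # Minimal sketch of the restricted Damerau-Levenshtein distance with a
        # dictionary lookup to flag non-real-word errors and suggest corrections.
        def damerau_levenshtein(a, b):
            d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
            for i in range(len(a) + 1):
                d[i][0] = i
            for j in range(len(b) + 1):
                d[0][j] = j
            for i in range(1, len(a) + 1):
                for j in range(1, len(b) + 1):
                    cost = 0 if a[i - 1] == b[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,         # deletion
                                  d[i][j - 1] + 1,         # insertion
                                  d[i - 1][j - 1] + cost)  # substitution
                    if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                        d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
            return d[len(a)][len(b)]

        dictionary = {"guru", "siswa", "ujian", "jawaban"}  # placeholder word list

        def check_word(word, max_distance=1):
            """Return suggestions if the word is not in the dictionary, else an empty list."""
            if word in dictionary:
                return []
            return sorted(w for w in dictionary if damerau_levenshtein(word, w) <= max_distance)

        print(check_word("ujain"))  # transposed letters -> ['ujian']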